Learning and Evaluation in the Presence of Class Hierarchies: Application to Text Categorization
Authors
Abstract
This paper deals with categorization tasks where categories are partially ordered to form a hierarchy. First, it introduces the notion of consistent classification which takes into account the semantics of a class hierarchy. Then, it presents a novel global hierarchical approach that produces consistent classification. This algorithm with AdaBoost as the underlying learning procedure significantly outperforms the corresponding “flat” approach, i.e. the approach that does not take into account the hierarchical information. In addition, the proposed algorithm surpasses the hierarchical local top-down approach on many synthetic and real tasks. For evaluation purposes, we use a novel hierarchical evaluation measure that has some attractive properties: it is simple, requires no parameter tuning, gives credit to partially correct classification and discriminates errors by both distance and depth in a class hierarchy.
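The abstract does not define consistent classification or the evaluation measure, so the following is only a minimal sketch of one way the two ideas could fit together. It assumes that a label set is consistent when it contains every ancestor of each of its labels, and that the measure is precision and recall computed on ancestor-extended label sets. The toy tree `PARENT` and the function names `with_ancestors`, `is_consistent`, and `hierarchical_prf` are hypothetical and chosen for illustration. A tree with a single parent per class is also an assumption made for brevity, since the paper allows any partial order.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical toy hierarchy (assumption): child -> parent, None marks a top-level class.
PARENT: Dict[str, Optional[str]] = {
    "science": None,
    "biology": "science",
    "genetics": "biology",
    "physics": "science",
}


def with_ancestors(labels: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the labels together with all of their ancestors."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Stop early once we reach a node whose ancestors were already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed


def is_consistent(labels: Iterable[str], parent: Dict[str, Optional[str]]) -> bool:
    """Assumed definition: a label set is consistent if it already includes all ancestors."""
    label_set = set(labels)
    return with_ancestors(label_set, parent) == label_set


def hierarchical_prf(
    predicted: Iterable[str],
    true: Iterable[str],
    parent: Dict[str, Optional[str]],
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 on ancestor-extended label sets (assumed measure)."""
    pred = with_ancestors(predicted, parent)
    gold = with_ancestors(true, parent)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A near miss (parent of the true class) versus a distant miss (sibling branch).
    print(hierarchical_prf({"biology"}, {"genetics"}, PARENT))
    print(hierarchical_prf({"physics"}, {"genetics"}, PARENT))
```

Under these assumptions, predicting `biology` for a document labelled `genetics` earns an F1 of about 0.8, while predicting `physics` earns about 0.4. The measure therefore gives partial credit and penalizes the more distant error more heavily, with no parameters to tune.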
Related resources
Hierarchical Text Categorization and Its Application to Bioinformatics
In a hierarchical categorization problem, categories are partially ordered to form a hierarchy. In this dissertation, we explore two main aspects of hierarchical categorization: learning algorithms and performance evaluation. We introduce the notion of consistent hierarchical classification that makes classification results more comprehensible and easily interpretable for end-users. Among the p...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods for searching, filtering, and managing resources are in high demand. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Hierarchical vs. flat n-gram-based text categorization: Can we do better?
Hierarchical text categorization (HTC) refers to assigning a text document to one or more most suitable categories from a hierarchical category space. In this paper we present two HTC techniques based on kNN and SVM machine learning techniques for categorization process and byte n-gram based document representation. They are fully language independent and do not require any text preprocessing s...
Enhancement of Learning Based Image Matting Method with Different Background/Foreground Weights
The problem of accurate foreground estimation in images is called Image Matting. In image matting methods, a map is used as learning data, which is produced by those pixels that are definitely foreground, definitely background, and unknown. This three-level pixel map is often referred to as a trimap, which is produced manually in alpha matte datasets. The true class of unknown pixels will be es...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Publication date: 2006